Medical Image Analysis
Elsevier BV
Preprints posted in the last 90 days, ranked by how well they match Medical Image Analysis's content profile, based on 33 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.
Xu, R.; Jiang, S.; Zhai, Y.; Chen, Y.
Background: Segmentation of the left ventricular myocardium, left ventricular cavity, and right ventricular cavity on short-axis cine cardiac magnetic resonance (CMR) images is essential for quantifying cardiac structure and function. However, existing automated segmentation tools are limited by small training datasets, narrow disease coverage, restrictive input format requirements, and the absence of anatomical plausibility constraints, hindering their clinical adoption. Methods: We constructed the largest annotated CMR short-axis segmentation dataset to date, comprising 1,555 subjects from 12 centers with five cardiac disease types and full cardiac cycle annotations totaling 319,175 labeled images. A MedNeXt-L model was trained using a 2D slice-by-slice strategy with full field-of-view input, eliminating dependencies on 3D volumes, temporal sequences, or region-of-interest (ROI) localization. A deterministic three-step post-processing pipeline was designed to enforce anatomical priors: a connected component constraint, a containment relationship constraint, and a gap-filling constraint. The model was validated on an internal test set (310 subjects) and three independent public external datasets (ACDC, M&Ms-1, and M&Ms-2; 855 subjects from 6 additional centers across 3 countries), spanning 15 cardiac disease categories, 10 of which were never encountered during training. Results: The model achieved mean Dice similarity coefficients (DSC) of 0.913 ± 0.037 and 0.911 ± 0.040 on internal and external test sets, respectively, with a cross-domain performance gap of only 0.002. Post-processing eliminated all containment violations (7.5% → 0%) and gap errors (1.8% → 0%) while reducing fragment rates by 85.5% (9.0% → 1.3%). Zero-shot generalization to 10 unseen disease categories yielded DSC values ranging from 0.899 to 0.921.
Automated clinical functional parameters demonstrated excellent agreement with manual measurements for left ventricular indices and right ventricular volumes (intraclass correlation coefficients ≥ 0.977). Conclusions: CorSeg-CineSAX provides a robust, open-source framework for fully automatic CMR short-axis segmentation across diverse clinical scenarios. All source code and pre-trained weights are publicly available at https://github.com/RunhaoXu2003/CorSeg.
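The first post-processing step described, removing spurious fragments so each class forms a single connected region, can be sketched as follows (a minimal illustration with a hypothetical helper, not the authors' released code):

```python
import numpy as np
from collections import deque

def keep_largest_component(mask, cls):
    """Relabel all but the largest 4-connected component of class
    `cls` as background (0) in a 2D integer label mask."""
    binary = mask == cls
    h, w = binary.shape
    visited = np.zeros((h, w), dtype=bool)
    components = []
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not visited[i, j]:
                comp, queue = [], deque([(i, j)])
                visited[i, j] = True
                while queue:  # BFS flood fill of one component
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    cleaned = mask.copy()
    # Keep the largest component, zero out the rest.
    for comp in sorted(components, key=len, reverse=True)[1:]:
        for y, x in comp:
            cleaned[y, x] = 0
    return cleaned

# Toy mask: one 9-pixel blob of class 1 plus a 1-pixel stray fragment.
m = np.zeros((6, 6), dtype=int)
m[0:3, 0:3] = 1
m[5, 5] = 1
out = keep_largest_component(m, 1)
```

In the full pipeline this per-class step would be followed by the containment and gap-filling constraints.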
Zhou, M.; Zhang, M.; Wang, J.; Shao, C.; Yan, G.
Cardiovascular disease is one of the leading causes of death worldwide, with myocardial infarction (MI) being a major cause of both morbidity and mortality among cardiovascular patients. MI patients face a higher risk of cardiovascular disease recurrence afterwards. Therefore, accurately predicting the risk of recurrence and identifying key risk factors are crucial for clinical decision-making. In this paper, we consider the interrelationships among cardiovascular factors from a systemic perspective. We first construct a differential network for each patient to capture individual-specific deviations in factor relationships and propose a novel method, termed Causal Factor-aware Graph Neural Network (CFGNN), which integrates factor interactions to predict the recurrence risk of MI patients while uncovering key risk factors from a causal perspective. Experimental results demonstrate that CFGNN performs well on real-world hospital-derived datasets, effectively identifying several key risk factors. This method not only deepens our understanding of cardiovascular disease but also paves the way for more targeted and effective interventions.
Kritopoulos, G.; Neofotistos, G.; Barmparis, G. D.; Tsironis, G. P.
Class imbalance in clinical electrocardiogram (ECG) datasets limits the diagnostic sensitivity of automated arrhythmia classifiers, particularly for rare but clinically significant beat types. We propose a three-stage hybrid generative pipeline that combines a spectral-guided conditional Variational Autoencoder (cVAE), a class-conditional latent Denoising Diffusion Probabilistic Model (DDPM), and a Quantum Latent Refinement (QLR) module built on parameterized quantum circuits to augment minority arrhythmia classes in the MIT-BIH Arrhythmia Database. The QLR module applies a bounded residual correction guided by Maximum Mean Discrepancy minimization to align synthetic latent distributions with real class-specific latent banks. A lightweight 1D MobileNetV2 classifier evaluated over five independent random seeds and four augmentation ratios serves as the downstream benchmark. Our findings establish latent diffusion augmentation as an effective strategy for imbalanced ECG classification and motivate further investigation of quantum-classical hybrid methods in cardiac diagnostics.
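The Maximum Mean Discrepancy criterion that guides the QLR residual correction has a simple plug-in estimator; a minimal numpy sketch under an RBF kernel (function name and bandwidth are illustrative assumptions, not from the paper):

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples
    X (n, d) and Y (m, d) under an RBF kernel with bandwidth sigma."""
    def k(A, B):
        # Pairwise squared distances, then Gaussian kernel.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
# Same distribution -> small MMD; mean-shifted distribution -> large MMD.
same = rbf_mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
shifted = rbf_mmd2(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
```

Minimizing such an estimate over the generator's latent outputs pulls synthetic latents toward the real class-specific latent bank.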
Shi, M.; Zheng, H.; Gottumukkala, R.; Jonathan, N.; Armstong, G. W.; Shen, L. Q.; Wang, M.
Early screening for glaucoma and diabetic retinopathy (DR) is critical to prevent irreversible vision loss, yet remains inaccessible to many underserved populations. However, AI models trained on hospital-grade fundus images often generalize poorly to low-cost images acquired with portable devices such as smartphones. We proposed CausalFund, a causality-inspired learning framework for training AI models that enable reliable low-resource screening from easily acquired non-clinical images. CausalFund disentangles disease-relevant retinal features from spurious image factors to achieve domain-generalizable screening across clinical and non-clinical settings. We integrated CausalFund with seven deep learning backbones for glaucoma and DR screening from portable-device fundus images, including lightweight architectures suitable for on-device deployment. Across diverse experimental settings and image quality conditions, CausalFund consistently improved AUC and achieved a more favorable sensitivity-specificity trade-off than conventional deep learning baselines. As a model-agnostic framework, CausalFund could be extended to other diseases and low-resource scenarios characterized by degraded or non-standard imaging.
Arian, R.; Allen, E.; Tyler, M.; Kafieh, R.
Regular optical coherence tomography (OCT) monitoring is essential for early detection of retinal disease and timely intervention, but frequent clinic-based imaging burdens patients and healthcare systems. Home-based OCT enables continuous monitoring and reduces clinic visits; however, compact optics and patient-operated acquisition introduce noise, reduced resolution, motion blur, and artifacts that limit clinical reliability and diagnostic confidence. To model home-based OCT acquisition, we employ simulated data reflecting images from Siloton, a compact home-based OCT device. Clinically realistic noise and acquisition artifacts were applied to high-quality OCT images using Siloton's simulation software, generating near-real patient-operated scans. Building on this dataset, we propose HAGAN, a Hybrid Attention Generative Adversarial Network developed through a progressive strategy, evolving from a baseline U-Net to an adversarial framework with hybrid attention. The best-performing U-Net architecture, EfficientNet-B1, identified through evaluation and ablation studies, is adopted as the generator. The generator incorporates attention gates at its skip connections and self-attention modules within the decoder, and is paired with a VGG19-based discriminator to form the HAGAN architecture. The model is trained using a multi-objective loss combining pixel-wise, structural, perceptual, edge-preserving, and adversarial components. Experiments on simulated home-based OCT data demonstrate that HAGAN consistently outperforms baseline and state-of-the-art models across standard enhancement metrics and a clinically relevant retinal layer segmentation downstream task, improving visual quality and preservation of diagnostically meaningful anatomical structures. These findings support the potential of HAGAN for reliable enhancement in future home-based OCT platforms, enabling remote retinal monitoring and reducing reliance on in-clinic imaging and routine hospital visits.
Highlights
- Enhancing the quality of home-based OCT images to support remote retinal monitoring and reduce the need for frequent referrals to clinical imaging centers
- Proposing HAGAN, a hybrid attention generative adversarial network for enhancing OCT images acquired using the Siloton home-based OCT device
- Hybrid attention design combining attention gates and self-attention to preserve fine retinal details and global anatomical consistency
- Adversarial learning framework improving perceptual realism and preservation of diagnostically relevant retinal structures in low-quality home-acquired OCT images
- Progressive model development from baseline U-Net to hybrid attention GAN, demonstrating systematic and measurable performance improvements
- Clinical relevance validated through downstream retinal layer segmentation, confirming preservation of diagnostically important structures
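A multi-term enhancement loss of the kind described can be sketched as a weighted sum. The pixel and edge-preserving terms below are concrete, while the perceptual and adversarial terms are represented by a scalar supplied externally (weights and names are illustrative assumptions, not the paper's values):

```python
import numpy as np

def edge_map(img):
    """Finite-difference gradient magnitude as a cheap edge proxy."""
    gy = np.diff(img, axis=0, append=img[-1:, :])
    gx = np.diff(img, axis=1, append=img[:, -1:])
    return np.hypot(gx, gy)

def hybrid_loss(pred, target, adv_term=0.0, w_pix=1.0, w_edge=0.5, w_adv=0.01):
    """Weighted enhancement loss: pixel-wise L1, edge-preserving L1 on
    gradient magnitudes, plus an adversarial term from a discriminator."""
    pix = np.abs(pred - target).mean()
    edge = np.abs(edge_map(pred) - edge_map(target)).mean()
    return w_pix * pix + w_edge * edge + w_adv * adv_term

# A perfect reconstruction incurs zero pixel and edge loss.
target = np.random.default_rng(0).random((32, 32))
```

In a full GAN training loop the structural and perceptual terms (e.g. SSIM and VGG-feature distances) would be added with their own weights.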
Li, F.; Li, S.; Qian, Y.; Chen, B.; Brody, J. A.; Yogeswaran, V.; Wiggins, K. L.; Sitlani, C. M.; Bis, J. C.; Shojaie, A.; Longstreth, W. T.; Psaty, B. M.; Tison, G. H.; Du, S.; Floyd, J. S.; Ye, T.
Atrial fibrillation and heart failure impose substantial health burdens worldwide, yet existing prediction models lack sufficient accuracy and generalizability. We developed CARDIAC-FM, a multimodal foundation model that learns joint representations of 12-lead electrocardiogram (ECG) and cardiac magnetic resonance imaging (MRI) through contrastive learning. We trained CARDIAC-FM on 57,609 paired ECG-cardiac MRI samples from UK Biobank and evaluated it in two external cohorts: the Cardiovascular Health Study (CHS) and the Multi-Ethnic Study of Atherosclerosis (MESA). CARDIAC-FM consistently outperformed unimodal models across all cohorts, and jointly incorporating ECG features with established clinical risk scores yielded additive gains in discrimination, indicating that ECG and traditional risk factors capture complementary dimensions of cardiovascular risk. The learned representations improved prediction across a range of cardiovascular outcomes with minimal task-specific fine-tuning, reflecting real-world settings where many diseases have limited positive samples and lack dedicated risk models. Although trained on paired ECG and MRI data, CARDIAC-FM generates predictions using ECG alone or ECG combined with established risk scores, enabling broad clinical deployment without MRI. These findings demonstrate the promise of multimodal pre-training for generalizable cardiovascular risk prediction.
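Contrastive pre-training over paired modalities of this kind is typically implemented with a symmetric InfoNCE objective; a minimal numpy sketch (the function name and temperature are illustrative, not taken from CARDIAC-FM):

```python
import numpy as np

def clip_style_loss(z_ecg, z_mri, temperature=0.1):
    """Symmetric InfoNCE over a batch of paired embeddings: matched
    ECG/MRI pairs sit on the diagonal of the similarity matrix."""
    z_ecg = z_ecg / np.linalg.norm(z_ecg, axis=1, keepdims=True)
    z_mri = z_mri / np.linalg.norm(z_mri, axis=1, keepdims=True)
    logits = z_ecg @ z_mri.T / temperature
    n = logits.shape[0]
    def xent(lg):
        # Row-wise cross-entropy with the diagonal as the target class.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()
    return 0.5 * (xent(logits) + xent(logits.T))

# Perfectly aligned pairs score far lower than mismatched ones.
z = np.eye(4)
aligned = clip_style_loss(z, z)
mismatched = clip_style_loss(z, np.roll(z, 1, axis=0))
```

After pre-training, only the ECG encoder is needed at inference time, which is what enables deployment without MRI.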
Qiu, P.; An, Z.; Ha, S.; Kumar, S.; Yu, X.; Sotiras, A.
Multimodal medical image analysis exploits complementary information from multiple data sources (e.g., multi-contrast Magnetic Resonance Imaging (MRI), Diffusion Tensor Imaging (DTI), and Positron Emission Tomography (PET)) to enhance diagnostic accuracy and support clinical decision making. Central to this process is the learning of robust representations that capture both modality-invariant and modality-specific features, which can then be leveraged for downstream tasks such as MRI segmentation and normative modeling of population-level variation and individual deviations. However, learning robust and generalizable representations becomes particularly challenging in the presence of missing modalities and heterogeneous data distributions. Most existing methods address this challenge primarily from a statistical perspective, yet they lack a theoretical understanding of the underlying geometric behavior, such as how probability mass is allocated across modalities. In this paper, we introduce a generalized geometric perspective for multimodal representation learning grounded in the concept of barycenters, which unifies a broad class of existing methods under a common theoretical perspective. Building on this barycentric formulation, we propose a novel approach that leverages generalized Wasserstein barycenters with hierarchical modality-specific priors to better preserve the geometry of unimodal distributions and enhance representation quality. We evaluated our framework on two key multimodal tasks, brain tumor MRI segmentation and normative modeling, demonstrating consistent improvements over a variety of multimodal approaches. Our results highlight the potential of scalable, theoretically grounded approaches to advance robust and generalizable representation learning in medical imaging applications.
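For orientation, the standard Wasserstein barycenter of modality distributions $\mu_1,\dots,\mu_M$ with weights $\lambda_m \ge 0$, $\sum_m \lambda_m = 1$, is the minimizer below (textbook definition; the paper's generalized, prior-weighted variant extends this):

```latex
\bar{\mu} \;=\; \arg\min_{\mu} \;\sum_{m=1}^{M} \lambda_m \, W_2^2(\mu, \mu_m)
```

Here $W_2$ is the 2-Wasserstein distance, so the barycenter is the distribution closest on average, in optimal-transport geometry, to all modality distributions.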
Pruckner, P.; Mito, R.; Vaughan, D. N.; Schilling, K. G.; Morgan, V. L.; Englot, D. J.; Smith, R. E.
Longitudinal probing of structural connectivity via diffusion magnetic resonance imaging (dMRI) is experiencing uptake. However, the detection of biological effects is significantly hampered by the limitations of cross-sectional streamline tractography, where even small changes in the dMRI signal can produce drastically different trajectories and therefore quantitative parameterisation; if not properly dealt with, such effects will manifest as spurious longitudinal change, which can obscure subtle biological differences. To overcome this challenge, we here introduce a novel quantitative streamline tractography framework tailored for longitudinal analysis, wherein an individual's streamline trajectories remain fixed throughout the analysis, allowing only their ascribed density weights to vary between sessions. We present two strategies by which these quantitative streamline weights can be determined, both extensions of the widely adopted SIFT2 method. The performance of this framework is benchmarked against cross-sectional reconstruction with and without SIFT2 optimisation, in both in silico dMRI phantoms with known ground truths and three distinct human in vivo cohorts with clear a priori expectations of biological effects. We demonstrate that the proposed framework drastically reduces methodological imprecisions in synthetic dMRI phantoms and enhances statistical sensitivity and specificity to biological effects in human cohorts, enabling robust longitudinal quantification of structural connectivity.
Taherkhani, M.; Pizzolato, M.; Morup, M.; Dyrby, T. B.
Diffusion-weighted magnetic resonance imaging (dMRI) is used to study white matter microstructure and to delineate pathways by estimating fiber orientation distributions (FODs). Symmetric FODs represent the conventional model assuming antipodal symmetry in water diffusion. However, in complex regions with bending, branching or fanning fibers, this assumption is not guaranteed. To better capture such underlying fiber geometries, asymmetric FODs (A-FODs), derived from neighboring FODs, have been introduced. Here, we propose an Encoder-based Curvature-Aware Regularization (EnCAR) method for estimating A-FODs. Incorporating curvature features into the regularization weight applied to neighboring voxels improves reconstruction of A-FODs. A self-supervised Transformer network, combined with a Spherical Harmonics Semantic Encoder, learns region-specific regularization parameters from this local neighborhood to capture the diversity of fiber geometries across the brain. The EnCAR method was verified on the DiSCo challenge phantom, and applied to in vivo multi-shell human data. The model estimated sharp, high-angular-resolution A-FODs that were well aligned with local fiber pathways. Compared with established FOD and A-FOD methods, it performed on par in regions dominated by symmetric FODs and outperformed them in complex asymmetric regions. Quantitative evaluation using the Asymmetry Index (ASI) and Model Discrepancy Index (MDI) confirmed improved consistency with the underlying diffusion signals. By ensuring smooth directional transitions, this work enhances the visibility of continuous fiber segments.
Ajadi, N. A.; Afolabi, S. O.; Adenekan, I. O.; Jimoh, A. O.; Ajayi, A. O.; Adeniran, T. A.; Adepoju, G. D.; Hassan, N. F.; Ajadi, S. A.
This research presents multimodal deep learning for structural heart disease prediction. We evaluated multiple deep learning architectures, including TCN, Simple CNN, ResNet1d18, Light Transformer, and a hybrid model. The models were examined across three seeds to ensure robustness, and bootstrap confidence intervals were used to measure performance differences. TCN consistently outperformed the competing architectures, achieving statistically significant improvements with stable performance across runs. In the predictive analysis, TCN also offered efficient computation and stable training relative to all competing architectures. Our results further underscore the importance of fairness evaluation when developing deep learning models for healthcare applications.
Kharade, A.; Pan, Y.; Andreescu, C.; Karim, H. T.
Machine learning models using functional magnetic resonance imaging (fMRI) are becoming increasingly popular; these models often rely on training data from multiple, large, publicly available datasets. It is often necessary to harmonize these data across sites and sequences, and algorithms like ComBat are frequently applied to correct for these differences. This has been shown to improve model performance and generalizability. However, applying traditional ComBat necessitates harmonizing all data (train, validation, test, and other unseen external test sets) simultaneously, which leads to potential data leakage and limits application to new unseen data. We introduce Consistent Reference External Batch (CREB) harmonization, a novel extension of ComBat that learns the prior distribution of site effects exclusively from a designated training set. This learned prior serves as a consistent, easily deployable reference point that employs the empirical Bayes framework to update the site effect for any new, external unseen data. This approach enables training, validation, and test sets to be harmonized separately, thereby preventing data leakage, ensuring the integrity of downstream analyses, and allowing application to new unseen data. CREB differs from traditional ComBat, in which each site's prior distribution is estimated at once and cannot be applied to unseen data or to sites not included in the original dataset. We tested CREB with training data from 2,846 participants (ages 18-97 years) across 9 different studies and test data from 1,113 participants (ages 18-88 years) from 3 studies. We evaluated the performance of harmonization with functional connectivity and gray matter volume. We show that CREB can effectively harmonize the test data to the training data, with performance comparable to ComBat, while conducting the harmonization in a two-step procedure that prevents leakage and is deployable to new unseen data.
Finally, we tested whether CREB could similarly preserve biological variance (e.g., whether age associations were preserved after harmonization). We found that CREB, like ComBat, could preserve age associations with both functional connectivity and gray matter volume measures. CREB provides an easily deployable, robust harmonization method to standardize data to a common reference distribution, making it uniquely suitable for training generalizable machine learning models.
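The core two-step idea, fit a reference on training data only, then map any new site onto it, can be sketched as follows (a simplified location/scale version for illustration; the actual CREB method additionally shrinks site estimates via the empirical Bayes machinery of ComBat):

```python
import numpy as np

class ReferenceHarmonizer:
    """Fit per-feature reference location/scale on training data, then
    map any new site's data onto that reference. A simplified sketch;
    class and method names are our own."""

    def fit(self, X_train):
        # Step 1: learn the reference distribution from training data only.
        self.mu = X_train.mean(axis=0)
        self.sd = X_train.std(axis=0)
        return self

    def transform_site(self, X_site):
        # Step 2: standardize the new site, then re-express it in the
        # reference distribution -- no refitting on unseen data.
        site_mu = X_site.mean(axis=0)
        site_sd = X_site.std(axis=0)
        return (X_site - site_mu) / site_sd * self.sd + self.mu

rng = np.random.default_rng(0)
h = ReferenceHarmonizer().fit(rng.normal(0.0, 1.0, (500, 3)))
site = rng.normal(5.0, 2.0, (200, 3))   # new site with a strong batch effect
out = h.transform_site(site)
```

Because the reference is frozen after `fit`, train, validation, and external test sets can each be transformed independently, which is what prevents leakage.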
Barkhau, C. B. C.; Mahjoory, K.; Brenner, M.; Weber, E.; Leenings, R.; Pellengahr, C.; Winter, N. R.; Konowski, M.; Straeten, T.; Meinert, S.; Leehr, E. J.; Flinkenfluegel, K.; Borgers, T.; Grotegerd, D.; Meinert, H.; Hubbert, J.; Jurishka, C.; Krieger, J.; Ringels, W.; Stein, F.; Thomas-Odenthal, F.; Usemann, P.; Teutenberg, L.; Nenadic, I.; Straube, B.; Alexander, N.; Jansen, A.; Jamalabadi, H.; Kircher, T.; Junghoefer, M.; Dannlowski, U.; Hahn, T.
Modeling individual brain dynamics from resting-state fMRI (rs-fMRI) remains challenging due to substantial inter-subject variability, measurement noise, and limited data length per subject. Here, we systematically evaluate a hierarchical dynamical systems framework based on shallow piecewise-linear recurrent neural networks (shPLRNNs) for individualized modeling of rs-fMRI data, with a particular focus on reproducing subject-specific functional connectivity (FC). We applied the framework to 1,423 rs-fMRI samples from healthy participants of the Marburg-Münster Affective Disorders Cohort Study (MACS). Simulated rs-fMRI data robustly reproduced empirical FC patterns, with comparable reconstruction accuracy on training and independent validation sets. Generalization to unseen individuals was heterogeneous and strongly depended on how typical a subject's connectivity pattern was relative to the training cohort, with template similarity explaining 37% of variance in reconstruction accuracy. Learned subject-specific parameters exhibited significant test-retest stability and higher within-subject than between-subject similarity on longitudinal data from two different timepoints, supporting their interpretation as individualized dynamical markers. Associations between individual parameters and demographic or cognitive variables were statistically significant but modest in effect size, and predictive performance remained below that obtained using empirical rs-fMRI features directly. Together, these results demonstrate that hierarchical shPLRNNs can extract meaningful and stable individual-specific dynamical structure from rs-fMRI data, while highlighting current limitations in capturing fine-grained individual differences. The findings delineate key trade-offs between model expressivity, generalization and subject specificity, and point to directions for future methodological refinement in individualized brain modeling.
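For orientation, the shallow PLRNN latent update commonly takes the form below, with $\phi(\cdot)=\max(0,\cdot)$ applied element-wise (notation from the dynamical-systems literature; the study's exact parameterization and observation model may differ):

```latex
z_t \;=\; A\, z_{t-1} \;+\; W_1\, \phi\!\left(W_2\, z_{t-1} + h_2\right) \;+\; h_1
```

The diagonal linear term $A z_{t-1}$ carries slow dynamics, while the single hidden expansion $W_1 \phi(W_2 z_{t-1} + h_2)$ supplies the piecewise-linear nonlinearity; in a hierarchical setup, some of these parameters are shared across subjects and others are subject-specific.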
Marques dos Santos, J. D.; Ramos, M. B.; Reis, L. P.; Marques dos Santos, J. P.; Direito, B.
The application of artificial intelligence (AI) to functional magnetic resonance imaging (fMRI) has gained increasing attention due to its ability to model complex, high-dimensional brain data and capture nonlinear patterns of neural activity. However, deep learning architectures, such as Graph Neural Networks (GNNs), typically require large sample sizes to achieve stable convergence, limiting their applicability in neuroimaging contexts where data are often scarce. This challenge highlights the need for compact, data-efficient models that maintain predictive performance and interpretability. Shallow neural networks (SNNs) have demonstrated robustness in low-sample settings but commonly rely on region-level features that treat brain areas independently, overlooking the brain's intrinsically network-based organization. To address this limitation, we propose a structurally constrained message-passing framework that integrates diffusion tensor imaging (DTI)-derived structural connectivity with region-level fMRI signals within a shallow architecture. This approach enables network-level modeling while preserving the stability and data efficiency of SNNs. The method is evaluated on 30 subjects performing a Theory of Mind (ToM) task from the Human Connectome Project Young Adult dataset. A baseline SNN achieved global accuracies of 88.2% (fully connected), 80.0% (pruned), and 84.7% (retrained), while the proposed model achieved 87.1%, 77.6%, and 84.7%, respectively. Although structural constraints led to a more pronounced performance decrease after pruning, retraining restored accuracy to baseline levels, demonstrating that biological constraints can be incorporated without compromising predictive validity. Model interpretability was assessed using SHAP (Shapley Additive Explanations).
While the baseline model primarily identified isolated regions as key contributors, the proposed framework revealed distributed, structurally coherent networks as the main drivers of classification. These networks showed correspondence with established ToM regions, including the temporo-parietal junction, superior temporal sulcus, and inferior frontal gyrus. Importantly, the findings suggest that groups of moderately informative regions can collectively form highly relevant subnetworks. Overall, the proposed framework achieves competitive performance in a limited dataset while incorporating graph-inspired message passing into a shallow architecture. Its explainability provides insight into how structurally constrained networks support stimulus-driven responses in ToM and demonstrates potential for investigating network dysfunction in disorders such as Alzheimer's disease, ADHD, autism spectrum disorder, bipolar disorder, mild cognitive impairment, and schizophrenia.
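The structural constraint described, using DTI connectivity to gate which regions may exchange messages, amounts to masking learned weights with a binary structural adjacency matrix; a one-step numpy sketch (all names illustrative):

```python
import numpy as np

def constrained_message_pass(x, W, A):
    """One message-passing step: learned weights W are masked by a
    binary structural-connectivity matrix A, so region i only receives
    messages from structurally connected regions."""
    return np.tanh((W * A) @ x)

# Region 0 is structurally disconnected, so it receives no messages.
A = np.array([[0.0, 0.0], [1.0, 1.0]])
W = np.ones((2, 2))
h = constrained_message_pass(np.array([1.0, 2.0]), W, A)
```

Because the mask zeroes out biologically implausible connections, the effective parameter count shrinks, which is consistent with the data-efficiency argument for shallow architectures.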
Wei, Y.; Smith, S. M.; Gohil, C.; Huang, R.; Griffin, B.; Cho, S.; Adaszewski, S.; Fraessle, S.; Woolrich, M. W.; Farahibozorg, S.-R.
Dynamic functional connectivity (dFC) models have become increasingly popular over the past decade for characterising time-varying interactions between brain regions. However, assessing and comparing dFC models remains challenging. Here, we introduce bi-cross-validation as a general framework for evaluating dFC models and selecting key hyperparameters, such as the number of states. By jointly partitioning the data across subjects and brain regions, bi-cross-validation enables out-of-sample evaluation without re-estimating latent states on the same data used for testing, thereby avoiding circularity. Using simulated data with known ground-truth dynamics, we show that bi-cross-validation favours models that accurately capture the underlying state structure. Applying the framework to real resting-state fMRI data, we demonstrate that bi-cross-validation naturally balances goodness-of-fit against model complexity, with performance improving and then declining as model complexity increases. Finally, we use bi-cross-validation to directly compare static and dynamic FC models, showing that dynamic models underperform static models at low spatial dimensionality, but outperform static models at sufficiently high dimensionality. Together, these results establish bi-cross-validation as a principled tool for dFC model selection, evaluation, and comparison.
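The joint partitioning that bi-cross-validation relies on can be illustrated directly: the (subjects x regions) data matrix is split into four blocks over held-out subjects and held-out regions, with the model fit on one block and scored on the opposite corner (a minimal sketch; function and key names are our own):

```python
import numpy as np

def bicv_blocks(X, test_subj, test_reg):
    """Partition a (subjects x regions) matrix into the four blocks of
    bi-cross-validation: fit on the (train, train) block, link through
    the off-diagonal blocks, score on the held-out (test, test) block."""
    subj = np.ones(X.shape[0], dtype=bool); subj[test_subj] = False
    reg = np.ones(X.shape[1], dtype=bool); reg[test_reg] = False
    return {
        "train": X[np.ix_(subj, reg)],           # fit latent states here
        "subj_holdout": X[np.ix_(~subj, reg)],   # project new subjects
        "reg_holdout": X[np.ix_(subj, ~reg)],    # project new regions
        "test": X[np.ix_(~subj, ~reg)],          # out-of-sample score
    }

X = np.arange(20).reshape(4, 5)   # toy matrix: 4 subjects x 5 regions
blocks = bicv_blocks(X, test_subj=[3], test_reg=[4])
```

Because the latent states are never re-estimated on the test block, the evaluation avoids the circularity the abstract warns about.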
Leyva, A.; Niazi, M. K. K.
There have been no systematic evaluations of purely spectral models for digital pathology tasks. We implemented and benchmarked four pipelines: binary classification on the BreaKHis dataset, multi-class region classification in glioblastoma, spatial transcriptomics, and denoising on Visium 10x. Across all tasks, extensive cross-validation and grouped splits showed that purely spectral models did not improve performance over CNN-only baselines, but offer useful complementary tools for interpretability and processing. Denoising showed strong performance, proving utility in data-scarce or heterogeneous image environments. Equivalence testing confirms that spectral and CNN model performances fall within a ±3% AUC margin. Fusion models between CNNs and spectral models show higher balanced accuracy. Spectral models failed to generalize across spatial transcriptomics tasks, with low correlation despite stable training loss. These findings represent a systematic negative result: despite their theoretical richness, spectral geometric features and SNO embeddings prove to be complementary, rather than superior, features for WSI classification or segmentation. Reporting such outcomes is essential to establish empirical boundaries for spectral methods and to encourage future work on conditions or data modalities where these approaches may hold greater promise.
Avaria-Saldias, R. H.; Ortiz, D.; Palma-Espinosa, J.; Cancino, A.; Cox, P.; Salas, R.; Chabert, S.
Accurate characterisation of the haemodynamic response function (HRF) is central to interpreting blood-oxygen-level-dependent (BOLD) signals in functional magnetic resonance imaging, yet standard estimation approaches remain centred around phenomenological formulations lacking biophysical grounding. We present a physics-informed neural network (PINN) framework that bridges these paradigms by embedding the Balloon-Windkessel model directly into the training objective of a multi-headed neural network. Our approach simultaneously estimates probable latent neurovascular state variables such as cerebral blood inflow, metabolic rate of oxygen consumption, blood volume, and deoxyhaemoglobin content, through an indirect optimisation scheme in which the predicted BOLD signal is obtained via convolution of the estimated HRF with experimental stimuli. Training is governed by a composite loss balancing differential-equation residuals, physiological initial conditions, and data fidelity. In simulations with temporal signal-to-noise ratios representative of clinical acquisitions, the framework recovered ground-truth state variables with coefficients of determination exceeding 0.99 and mean squared errors below 10^-3, at a physics-to-data weighting of 0.40:0.60. Application to 1.5 T block-design fMRI data from an ischaemic stroke patient yielded physiologically plausible, subject-specific HRF estimates, establishing the feasibility of single-subject, physics-constrained HRF inference without reliance on fixed gamma basis assumptions. To our knowledge, this constitutes the first deployment of a single PINN incorporating the full Balloon-Windkessel model within an indirect training objective that reconstructs full BOLD observations, positioning PINN-based haemodynamic modelling as a principled and personalised route towards more interpretable and patient-specific fMRI biomarkers.
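The composite objective described takes the generic form below (term symbols are ours; the reported physics-to-data weighting is 0.40:0.60):

```latex
\mathcal{L} \;=\; \lambda_{\text{phys}}\, \mathcal{L}_{\text{ODE}}
\;+\; \lambda_{\text{ic}}\, \mathcal{L}_{\text{IC}}
\;+\; \lambda_{\text{data}}\, \mathcal{L}_{\text{data}},
\qquad \lambda_{\text{phys}} : \lambda_{\text{data}} \,=\, 0.40 : 0.60
```

Here $\mathcal{L}_{\text{ODE}}$ penalizes residuals of the Balloon-Windkessel differential equations evaluated on the predicted latent states, $\mathcal{L}_{\text{IC}}$ enforces physiological initial conditions, and $\mathcal{L}_{\text{data}}$ is the fit of the reconstructed BOLD signal to the measurements.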
Chandio, B. Q.; Feng, Y.; Ba Gari, I.; Alibrando, J. D.; Thomopoulos, S. I.; Villalon-Reina, J. E.; Liou, K.; Somu, S.; Yoo, H.; Nir, T. M.; Garyfallidis, E.; Luders, E.; Yeh, F.-C.; Jahanshad, N.; Thompson, P. M.
White-matter hemispheric asymmetry is a fundamental property of human brain organization and is known to change in aging, neurodevelopment, and neurodegenerative disorders. Tractometry analyzes diffusion-derived microstructural measures along the full length of tracts, localizing changes to specific tract segments rather than collapsing tracts into a single value. Yet, existing frameworks lack a principled way to quantify left-right hemispheric asymmetries along homologous tracts. Here, we introduce an asymmetry-aware tractometry framework that integrates a symmetric white-matter atlas with BUAN (Bundle Analytics) to enable anatomically consistent, along-tract comparison of homologous pathways. By defining homologous bundles with a shared template and consistent orientation, each left-hemisphere segment is directly matched to its right-hemisphere counterpart, enabling principled, segment-wise comparison and revealing spatially localized asymmetries along the tract. Applying this framework to diffusion MRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) comprising 1,215 subjects, we demonstrate how this approach reveals systematic left-right asymmetries across major white-matter pathways and show how these patterns differentiate cognitively normal (CN) individuals from those with mild cognitive impairment (MCI) and dementia. This method provides a sensitive and anatomically grounded tool for studying hemispheric specialization and its disruption in aging and disease, and establishes a general approach for asymmetry-aware tractometry in population neuroimaging studies.
Hakata, Y.; Oikawa, M.; Fujisawa, S.
Background. Federated learning (FL) enables collaborative model training across institutions without sharing patient-level data. However, standard FL algorithms such as FedAvg degrade under non-independently and non-identically distributed (non-IID) data, a prevalent condition when patient demographics, scanner hardware, and disease prevalence differ across hospital sites. Objective. We propose iPS-MFFL (Individualized Per-Site Meta-Federated Feature Learning), a federated framework with a hierarchical local-model architecture that addresses non-IID heterogeneity through (1) a shared feature extractor, (2) multiple weak-learner classification heads that can be trained with heterogeneous training objectives to promote complementary decision boundaries, (3) independent per-learner server aggregation so that each weak learner's parameters are averaged only with its counterparts at other clients, and (4) a lightweight meta-model, itself federated, that adaptively stacks the weak-learner outputs. Methods. We evaluate on the Brain Tumor MRI Classification dataset (7,200 images; 4 classes: glioma, meningioma, pituitary tumor, no tumor) partitioned across K = 5 simulated hospital sites using Dirichlet non-IID sampling (alpha = 0.3). Four baselines are compared: Local-only training, FedAvg, FedProx, and Freeze-FT. All experiments are repeated over three random seeds (13, 42, 2025) and evaluated using paired t-tests, Cohen's d effect sizes, and post-hoc power analysis.
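The Dirichlet non-IID split used to simulate the K = 5 hospital sites follows a standard recipe: for each class, per-client proportions are drawn from Dirichlet(alpha), and small alpha yields strongly skewed client label distributions. A minimal numpy sketch (function name is ours):

```python
import numpy as np

def dirichlet_partition(labels, n_clients=5, alpha=0.3, seed=42):
    """Split sample indices across simulated clients with per-class
    proportions drawn from Dirichlet(alpha); small alpha -> non-IID."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        props = rng.dirichlet([alpha] * n_clients)
        # Convert proportions to cut points, then slice this class's
        # samples into one chunk per client.
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

labels = np.repeat([0, 1, 2, 3], 100)   # toy stand-in for the 4 tumor classes
parts = dirichlet_partition(labels)
```

Every sample lands on exactly one client, but the class mix per client is heavily skewed, reproducing the heterogeneity that degrades FedAvg.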
KOM SANDE, S. D.; Skorski, M.; Theobald, M.; Schneider, J.; Marz, W.
Cardiovascular diseases (CVDs) remain the foremost cause of global morbidity and mortality, driving an urgent need for robust predictive tools that enable early detection and preventive intervention. Traditional regression-based models, such as linear and logistic regression, regression trees and forests, and Support Vector Machines (SVMs), have long underpinned CVD risk estimation but often assume linear relationships, homogeneous effects across populations, and a limited number of predictors. Recent advances in regression, such as bagging and boosting, as well as Generative Artificial Intelligence (GenAI) and Large Language Models (LLMs), are increasingly shifting this paradigm. In this paper, we review key developments in the context of both classic regression techniques and recent GenAI approaches, with a particular focus on openly available Medical LLMs (MedLLMs) in combination with few-shot prompting and classification finetuning. Based on the LURIC cardiovascular health study, we investigate a broad variety of biomarkers and risk factors in two cohorts of 3,316 CVD-risk patients who underwent coronary angiography in Germany between 1997 and 2000. Our results demonstrate that large, pretrained MedLLMs (70B) achieve up to 82% AUROC for 1-year all-cause mortality (1YM) prediction with optimized few-shot prompting, performing competitively with recent regression techniques and state-of-the-art methods from the medical literature such as CoroPredict, SMART, and SCORE2. Smaller models (8B) can be finetuned to match or even surpass their larger counterparts as well as commercial models like ClaudeSonnet-4.5 and ChatGPT-5.2. Among all evaluated approaches, the best-performing boosting-based regression technique (CatBoost) and the best-performing commercial LLM (Gemini-3-Flash) both achieve an AUROC of up to 85%.
Further model-calibration and model-stratification analyses reveal a systematic mortality over-prediction by MedLLMs (ECE: 0.05-0.10), while Platt scaling reduces this miscalibration by 60-90%.
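Platt scaling, the calibration step credited with the 60-90% miscalibration reduction, fits a two-parameter sigmoid p = sigmoid(a*s + b) to held-out model scores by minimizing the log loss. A self-contained sketch with a toy over-confident model (plain gradient descent stands in for whatever solver the authors used; all data below are synthetic):

```python
import numpy as np

def platt_scale(scores, labels, lr=0.1, n_iter=2000):
    """Fit Platt scaling p = sigmoid(a*s + b) to raw model scores (logits)
    by gradient descent on the log loss. A generic sketch of the
    calibration step named in the abstract, not the authors' code."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * s + b)))
        g = p - y                      # d(logloss)/d(logit)
        a -= lr * np.mean(g * s)
        b -= lr * np.mean(g)
    return a, b

# toy over-confident scores: model predicts ~0.8 on average,
# but the true event rate is ~0.5
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)
raw = np.clip(0.7 + 0.2 * y + rng.normal(0, 0.05, 500), 1e-3, 1 - 1e-3)
logit = np.log(raw / (1 - raw))
a, b = platt_scale(logit, y)
calibrated = 1.0 / (1.0 + np.exp(-(a * logit + b)))
```

After fitting, the mean calibrated probability tracks the observed event rate, which is exactly the systematic over-prediction the abstract reports Platt scaling correcting.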
Hakata, Y.; Oikawa, M.; Fujisawa, S.
Background. Adult diffuse glioma is a representative class of primary brain tumors for which accurate MRI-based tumor segmentation is indispensable for treatment planning. Conventional automated segmentation methods have relied primarily on image information and spatial prompts, and auxiliary clinical information that is routinely acquired in clinical practice has not been sufficiently exploited as an input. Objective. Building on a dual-prompt-driven Segment Anything Model (SAM) extension framework that fuses visual and language reference prompts, we propose a method that integrates patient demographics, unsupervised molecular cluster variables derived from TCGA high-throughput profiling, and histopathological parameters as learnable prompt embeddings, and we evaluate its effect on the accuracy of lower-grade glioma (LGG) MRI segmentation. Methods. An auxiliary prompt encoder converts clinical metadata into high-dimensional embeddings that are fused with the prompt representations of Segment Anything Model (SAM) ViT-B through a cross-attention fusion mechanism. The TCGA-LGG MRI Segmentation dataset (Kaggle release by Buda et al.; n = 110 patients; WHO grade II-III) was split at the patient level (train/val/test = 71/17/22) using three different random seeds, and the three slices with the largest tumor area were extracted from each patient. To avoid pseudo-replication arising from multiple slices per patient and repeated measurements across seeds, our primary analysis aggregated Dice and 95th-percentile Hausdorff distance (HD95) to the patient x seed unit (n = 66); secondary analyses at the unique-patient level (n = 22) and at the per-slice level (n = 198) are also reported. Pairwise comparisons used paired t-tests with Bonferroni correction (k = 3) and Wilcoxon signed-rank tests, and a permutation test (K = 30) served as an auxiliary check of effective use of the auxiliary information. Results. 
At the patient x seed level (n = 66), Proposed (full clinical) achieved a Dice gain of +0.287 over the zero-shot SAM ViT-B baseline (paired-t p = 4.2 x 10^-15, Cohen's d_z = +1.25, Bonferroni-corrected p << 0.001; Wilcoxon p = 2.0 x 10^-10), and HD95 improved from 218.2 to 64.6. Because zero-shot SAM is not designed for domain-specific medical segmentation, the large absolute HD95 gap largely reflects the expected domain gap rather than a competitive baseline. The additional contribution of the full clinical configuration over the demographics-only configuration was Dice = +0.023 (paired-t p = 0.057, Bonferroni-corrected p = 0.172), which did not reach statistical significance at the patient level and is reported as a directional trend. The permutation test (K = 30, seed 2025) yielded real-metadata Dice = 0.819 versus a shuffled-metadata mean of 0.773, giving an empirical p = 0.032 = 1/(K + 1), which is at the resolution limit of this test and should therefore be interpreted as preliminary evidence. Conclusions. Integrating auxiliary clinical information as multimodal prompts produced a large improvement over the zero-shot SAM baseline on this LGG cohort. More importantly, a robustness analysis showed that Proposed (full clinical) outperformed the trained Base (no auxiliary information) under all tested spatial-prompt conditions, including perfect centroid (+0.014), and that the advantage was most pronounced in the prompt-free regime (+0.231, p = 0.039), where the base model collapsed but the proposed model maintained meaningful segmentation by leveraging clinical metadata alone. The additional contribution of molecular and histopathological information beyond demographics was not statistically resolved at the patient level (+0.023, n.s.). Establishing clinical utility will require external validation on larger multi-center cohorts and direct comparisons with established segmentation methods. 
Keywords: brain tumor segmentation; Segment Anything Model (SAM); vision-language prompt-driven segmentation; auxiliary clinical prompts; multimodal learning; TCGA-LGG; deep learning
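The cross-attention fusion described in the Methods, with prompt tokens attending over clinical-metadata embeddings, can be illustrated in a few lines. All shapes and weight matrices below are hypothetical; this is a single-head sketch of the mechanism, not SAM's actual prompt encoder:

```python
import numpy as np

def cross_attention(prompt_tokens, clinical_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: prompt tokens (queries) attend over
    clinical-metadata embeddings (keys/values), with a residual
    connection. Minimal illustration of the fusion mechanism described
    in the abstract; not SAM's actual modules."""
    Q = prompt_tokens @ Wq       # (P, d) queries
    K = clinical_tokens @ Wk     # (C, d) keys
    V = clinical_tokens @ Wv     # (C, d) values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # softmax over clinical tokens
    return prompt_tokens + w @ V         # residual fusion

rng = np.random.default_rng(0)
d = 16
prompts = rng.normal(size=(4, d))    # 4 hypothetical prompt tokens
clinical = rng.normal(size=(3, d))   # demographics / molecular / histology
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
fused = cross_attention(prompts, clinical, Wq, Wk, Wv)
```

The residual form means that when the clinical embeddings carry no signal, the prompt tokens pass through largely unchanged, which is one plausible reason the prompt-free regime benefits most from the metadata.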